Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction

Han Zhou; Xuchao Guo; Chengqi Liu; Zhan Tang; Shuhan Lu; Lin Li

TIIS (Korean Society for Internet Information)

Korean Title	Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction
English Title	Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction
Author	Han Zhou; Xuchao Guo; Chengqi Liu; Zhan Tang; Shuhan Lu; Lin Li
Citation	Vol. 15, No. 11, pp. 3991-4010 (2021. 11)
Korean Abstract
English Abstract	Question Similarity Measurement of Chinese Crop Diseases and Insect Pests (QSM-CCD&IP) aims to judge the user’s questioning intent for an input question. It is the basis of agricultural knowledge question answering (Q&A) systems, information retrieval, and other tasks. However, the corpora and measurement methods available in this field have deficiencies. In addition, general sentence-embedding methods ignore word boundary features and local context information, which can cause error propagation. These factors make the task challenging. To address these problems, this work first establishes a corpus on Chinese crop diseases and insect pests (CCDIP), which contains 13 categories. Taking CCDIP as the research object, the study then proposes AgrCQS, a Chinese agricultural text similarity matching model based on mixed information extraction. Specifically, a hybrid embedding layer enriches character information and improves the model’s ability to recognize word boundaries; a multi-core convolutional neural network based on multi-weight (MM-CNN) extracts multi-scale local information; and a self-attention mechanism strengthens the model’s fusion of global information. The performance of AgrCQS is verified on CCDIP and on three benchmark datasets, namely AFQMC, LCQMC, and BQ. The accuracy rates on these four datasets are 93.92%, 74.42%, 86.35%, and 83.05%, respectively, which are higher than those of baseline systems without using any external knowledge. Additionally, the proposed modules can be extracted separately and applied to other models, thus providing a reference for related research.
Keyword	Text semantic similarity; Short text-similarity; Agricultural natural language processing; Chinese word segmentation
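
The abstract names two kinds of information that general sentence embeddings tend to lose: word boundary features and multi-scale local context. The sketch below is only an illustration of those two ideas, not the AgrCQS implementation, which the abstract does not specify. It assumes the question has already been segmented into words, uses the common BMES tagging scheme (Begin, Middle, End, Single) as one possible way to attach boundary information to characters, and lists the character windows that convolution kernels of several widths would cover. The function names `boundary_tags` and `multi_scale_windows`, the window widths, and the sample question are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def boundary_tags(words: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair every character with a BMES word-boundary tag.

    Assumption: `words` is an already segmented sentence. BMES is one
    common tagging scheme, not necessarily the one used by AgrCQS.
    """
    tagged: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "S"))  # single-character word
            continue
        last = len(word) - 1
        for i, ch in enumerate(word):
            if i == 0:
                tag = "B"  # first character of a multi-character word
            elif i == last:
                tag = "E"  # last character of a multi-character word
            else:
                tag = "M"  # interior character
            tagged.append((ch, tag))
    return tagged


def multi_scale_windows(
    chars: Sequence[str], widths: Sequence[int] = (2, 3, 4)
) -> Dict[int, List[str]]:
    """List the character spans that a kernel of each width slides over.

    The widths are illustrative; they stand in for convolution kernels
    of different sizes looking at local context at several scales.
    """
    return {
        w: ["".join(chars[i:i + w]) for i in range(len(chars) - w + 1)]
        for w in widths
    }


if __name__ == "__main__":
    # Hypothetical pre-segmented question, not taken from the CCDIP corpus.
    question = ["水稻", "稻瘟病", "怎么", "防治"]
    tagged = boundary_tags(question)
    print(tagged)
    print(multi_scale_windows([ch for ch, _ in tagged]))
```

In a trained model, each character and each tag would map to a learned vector, and the windows would be scored by learned kernels; the sketch only shows which inputs those components would see.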